Project: Investigate The Movie Database (TMDb)

Table of Contents

Movie Data Analysis

Introduction

  • This data set contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue.
  • This dataset was generated from The Movie Database API.
  • If you are curious about how this dataset was prepared, the code to access TMDb's API is posted here.
  • The analysis uses the following variables:
    • Original title
    • Main genre
    • Release date
    • Release year
    • Budget
    • Revenue
    • Main actor
    • Month of release
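
Several of the variables listed above (main genre, main actor, month of release) are derived rather than read directly. A minimal sketch of how they could be derived is shown here, assuming pipe-separated `genres` and `cast` columns and a month/day/year `release_date` string. The column names, the date format, and the sample rows are assumptions for illustration, not taken from the original notebook.

```python
import pandas as pd

# Hypothetical rows; column names and formats are assumed, not confirmed.
movies = pd.DataFrame({
    "original_title": ["Film A", "Film B"],
    "genres": ["Action|Adventure", "Drama"],
    "cast": ["Actor One|Actor Two", "Actor Three"],
    "release_date": ["6/9/2015", "12/25/1999"],
})

# Keep only the first entry of each pipe-separated field.
movies["main_genre"] = movies["genres"].str.split("|").str[0]
movies["main_actor"] = movies["cast"].str.split("|").str[0]

# Parse the date string, then extract the month number (1 to 12).
movies["release_date"] = pd.to_datetime(movies["release_date"], format="%m/%d/%Y")
movies["release_month"] = movies["release_date"].dt.month

print(movies[["original_title", "main_genre", "main_actor", "release_month"]])
```

If the real file stores two-digit years, the `format` string would need to change and the century of older films should be checked by hand.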

Throughout the report, I explore the following questions:

  1. How have film budgets trended over the years?
  2. How have film revenues trended over the years?
  3. How are the popularity and ratings of movies affected by genre?
  4. How does profitability vary for films released in different months?
  5. How has the profitability of making films changed over time?

Data Wrangling

General Properties

Summary:

  1. We read the dataset.
  2. We took a peek at the values in the dataset.
  3. We checked the summary information.
  4. We checked for null values so we could decide what to do with them.
  5. We removed the duplicated rows.
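
The five steps above can be sketched with pandas as follows. This is an illustration under assumptions, not the notebook's original code: the CSV text is a small made-up stand-in for the real file, and the column names are hypothetical.

```python
import io
import pandas as pd

# Made-up CSV content standing in for the real data file.
csv_text = """id,original_title,budget,revenue
1,Film A,100,250
2,Film B,0,
2,Film B,0,
"""

# 1. Read the dataset (a real run would pass a file path instead).
df = pd.read_csv(io.StringIO(csv_text))

# 2. Peek at the first rows.
print(df.head())

# 3. Summary information: column types and non-null counts.
df.info()

# 4. Count null values per column.
null_counts = df.isnull().sum()
print(null_counts)

# 5. Remove fully duplicated rows.
df = df.drop_duplicates()
print(len(df))
```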

Data wrangling:

Cleaning number 2:

Cleaning number 3:

Cleaning number 4:

Cleaning number 5:

Exploratory Data Analysis

In the pie chart above we can see that the most-produced genres were Drama (22.6%), Comedy (21.3%), Action (14.6%), and Horror (8.42%).
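
Percentages like these can be computed from a column holding one main genre per film. The sketch below assumes such a column exists under the hypothetical name `main_genre` and uses made-up values.

```python
import pandas as pd

# Made-up main genres, one per film; the name main_genre is an assumption.
main_genre = pd.Series(["Drama", "Drama", "Comedy", "Action"], name="main_genre")

# Share of films per genre, as a percentage rounded to one decimal.
genre_share = main_genre.value_counts(normalize=True).mul(100).round(1)
print(genre_share)
```

The resulting series can be passed to a plotting call to draw the pie chart.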

We can see here that the average movie budget fluctuated from 1960 to 1970, then rose until it peaked in 1999; after that, we can observe a steady decline in movie budgets.
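
A yearly average like the one described here can be obtained by grouping on the release year. This is a minimal sketch with made-up numbers; the column names `release_year` and `budget` are assumptions.

```python
import pandas as pd

# Made-up budgets; column names are assumed for illustration.
df = pd.DataFrame({
    "release_year": [1998, 1998, 1999, 1999, 2000],
    "budget": [10, 20, 30, 50, 20],
})

# Mean budget per release year, indexed by year.
avg_budget = df.groupby("release_year")["budget"].mean()

# Year with the highest average budget.
peak_year = avg_budget.idxmax()
print(avg_budget)
print(peak_year)
```

If zero budgets in the real data stand for missing values, which is worth checking, they would pull the yearly averages down and should be handled before grouping.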

Plotting revenues alongside budgets gives a clear view of how the two differ and how their fluctuations relate.
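
One way to put a number on how closely budgets and revenues move together is the Pearson correlation coefficient. The sketch below uses made-up values and assumed column names; it is not the computation from the original notebook.

```python
import pandas as pd

# Made-up figures; column names are assumed for illustration.
df = pd.DataFrame({
    "budget": [10, 20, 30, 40],
    "revenue": [25, 35, 70, 90],
})

# Pearson correlation: values near +1 mean the two rise together.
r = df["budget"].corr(df["revenue"])
print(round(r, 2))
```

A high correlation indicates a positive linear relationship; it does not by itself show that revenue is a fixed multiple of budget.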

Here we have a bar chart of the different movie genres, showing both their popularity and the average vote rating they receive.
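
The per-genre figures behind such a chart can be computed with a single group-by. The sketch below assumes hypothetical columns `main_genre`, `popularity`, and `vote_average`, with made-up values.

```python
import pandas as pd

# Made-up rows; column names are assumed for illustration.
df = pd.DataFrame({
    "main_genre": ["Action", "Action", "Documentary", "Documentary"],
    "popularity": [2.0, 1.0, 0.2, 0.4],
    "vote_average": [6.0, 5.0, 7.0, 6.8],
})

# Mean popularity and mean rating for each genre.
by_genre = df.groupby("main_genre")[["popularity", "vote_average"]].mean()

most_popular = by_genre["popularity"].idxmax()
best_rated = by_genre["vote_average"].idxmax()
print(by_genre)
print(most_popular, best_rated)
```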

Here we can see that the most profitable movies are released mainly in summer, as well as in November and December as the holidays approach.
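
A monthly comparison like this can be sketched by defining profit as revenue minus budget and averaging it per release month. That definition of profit, the column names, and the numbers below are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; column names are assumed for illustration.
df = pd.DataFrame({
    "release_month": [6, 6, 12, 2],
    "revenue": [300, 200, 400, 50],
    "budget": [100, 100, 150, 40],
})

# Profit is taken here as revenue minus budget (an assumption).
df["profit"] = df["revenue"] - df["budget"]

# Average profit per release month, highest first.
profit_by_month = (
    df.groupby("release_month")["profit"].mean().sort_values(ascending=False)
)
print(profit_by_month)
```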

Conclusions

1. The statistics show that the highest movie profits are made in summer, as well as before the holidays in November and December.
2. Drama, Comedy, and Action are the most-produced genres, which may indicate their popularity, but that needs further investigation.
3. The graph below shows that mega movies with huge budgets flourished from 1990 to 2000; the average budget then trended down from 33 million in 1999 to 10.4 million in 2014.

4. Looking at the relation between film budgets and revenues, we find that they are positively related: revenues tend to rise as budgets rise.

5. The two most popular genres are Action and Science Fiction. Documentaries are not as popular, but they received the highest average rating at 6.9/10, followed by Music at 6.6/10.

Limitations

1. We used the TMDb movies dataset for our analysis and worked with popularity, revenue and runtime. Our analysis is limited to the provided dataset. For example, the dataset does not confirm that every release of every director is listed.

2. No normalization, exchange-rate adjustment, or currency conversion was applied during this analysis, so it is limited to the raw numerical values of revenue.

3. Dropping missing or null values from the variables of interest might skew our analysis and could introduce unintentional bias into the relationships being analyzed.
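
To gauge how much the third limitation matters, one can measure the share of rows lost when nulls are dropped. The sketch below uses made-up values and assumed column names.

```python
import pandas as pd

# Made-up rows with gaps; column names are assumed for illustration.
df = pd.DataFrame({
    "budget": [100, None, 300, 50],
    "revenue": [250, 80, None, 60],
})

# Keep only rows where both variables of interest are present.
cleaned = df.dropna(subset=["budget", "revenue"])

# Fraction of the original rows that were discarded.
dropped_share = 1 - len(cleaned) / len(df)
print(dropped_share)
```

A large dropped share suggests the remaining rows may not represent the full dataset.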